Determining the best classifier for predicting the value of a boolean field on a blood donor database

Author

  • Ritabrata Maiti
Abstract

Motivation: Thanks to digitization, we often have access to large databases consisting of various fields of information, ranging from numbers to text and even boolean values. Such databases lend themselves especially well to machine learning, classification, and big data analysis tasks. We can train classifiers on existing data and use them to predict the value of a certain field, given information about the other fields. Specifically, in this study, we look at the Electronic Health Records (EHRs) that are compiled by hospitals. These EHRs are a convenient means of accessing the data of individual patients, but processing them as a whole remains a challenge. However, EHRs that are composed of coherent, well-tabulated structures lend themselves quite well to the application of machine learning via classifiers. In this study, we look at a Blood Transfusion Service Center Data Set (data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan). We used the scikit-learn machine learning library in Python. From Support Vector Machines (SVM), we use Support Vector Classification (SVC); from the linear models, we import Perceptron. We also used the KNeighborsClassifier and decision tree classifiers. We segmented the database into two parts. We trained the classifiers on the first part and used the second part to verify whether each classifier's predictions matched the actual values.

Results: The test program tests each classifier individually. It counts the number of predictions that match the actual values and displays these counts. Using the counts, we are able to decide on the best classifier for the given blood donor database. Using the most accurate models, or a collection of these models, we will be able to determine the most accurate prediction for each patient. Here, we wish to determine whether a patient had donated blood in March 2017. This prediction is a boolean value (1 or 0), where 1 denotes that the patient had donated blood and 0 denotes otherwise.

Contact: [email protected]
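The procedure described in the abstract (train several scikit-learn classifiers on one part of the data, then count how many predictions on the other part match the actual values) can be sketched roughly as follows. This is a minimal illustration, not the author's program: the data here is synthetic, and the feature count, split point, default hyperparameters, and variable names are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 4 numeric fields per donor and a
# boolean target (1 = donated, 0 = did not donate). The real study uses
# the Blood Transfusion Service Center Data Set instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Segment the data into two parts: the first trains the classifiers,
# the second checks predictions against the actual values.
split = 150  # hypothetical split point
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# The four classifier families named in the abstract, with default settings.
classifiers = {
    "SVC": SVC(),
    "Perceptron": Perceptron(random_state=0),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}

# Test each classifier individually and count matching predictions.
match_counts = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)
    match_counts[name] = int(np.sum(predictions == y_test))

# The classifier with the most matches is the candidate "best" one.
best = max(match_counts, key=match_counts.get)
for name, count in match_counts.items():
    print(f"{name}: {count}/{len(y_test)} predictions matched")
print("Most matches:", best)
```

A single fixed split keeps the sketch close to the description above; the ranking of classifiers can change with a different split or different hyperparameters.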

Similar resources

Automated Tumor Segmentation Based on Hidden Markov Classifier using Singular Value Decomposition Feature Extraction in Brain MR images

Introduction: Diagnosing a brain tumor is not always easy for doctors, and the existence of an assistant that facilitates the interpretation process is an asset in the clinic. Computer vision techniques are devised to aid the clinic in detecting tumors based on a database of tumor c...

Identifying people's blood donation behavior patterns using the K-Means algorithm based on recency, frequency, and blood value

Introduction: The blood donation rate in developed countries is 18 times higher than in developing countries. It is estimated that if only five percent of Iran's population donated blood, it would be adequate to meet the needs of the community. The aim of this paper is to identify blood donors' loyalty behavior for proper planning to extend and enhance blood donation habits among t...

A Method for Predicting Pile Capacity Using Cone Penetration Test Data

The massive construction in poor lands has encouraged engineers to use deep foundations in order to transfer superstructure loads to the subsoil. Since soil excavation, sampling, and laboratory testing as a part of site investigation are relatively difficult, in-situ tests such as cone penetration test (CPT) as a very informative test may be recommended. The CPT has been widely used in engineer...

Delayed graft function and its associated risk factors at the kidney transplant center of Imam Khomeini Hospital, Urmia

Background & Aims: Delayed graft function (DGF) is an important challenge in the field of kidney transplantation. Occurrence of DGF is associated with low 1st- and 5th-year graft survival rates. Reducing the incidence of DGF minimizes the financial burden on the health care system and improves the quality of life for organ recipients. Although over 2000 transplantations have been performed in Imam Kha...

Detection of premature ventricular contraction arrhythmia in the cardiac electrical signal using a combination of classifiers

Cardiovascular diseases are among the most dangerous diseases and one of the biggest causes of fatality all over the world. One of the most common cardiac arrhythmias considered by physicians is premature ventricular contraction (PVC) arrhythmia. Detecting this type of arrhythmia is particularly important because of its prevalence across all ages. ECG signal recording is a non-invasive, popula...

Journal:
  • CoRR

Volume abs/1802.07756  Issue 

Pages  -

Publication date 2017